Development of artificial neural networks for early prediction of intestinal perforation in preterm infants

Intestinal perforation (IP) in preterm infants is a life-threatening condition that may result in serious complications and increased mortality. Early prediction of IP in infants is important but challenging because of the multifactorial and complex nature of the disease. Thus, there are no reliable tools to predict IP in infants. In this study, we developed new machine learning (ML) models for predicting IP in very low birth weight (VLBW) infants and compared their performance to that of classic ML methods. We developed artificial neural networks (ANNs) using VLBW infant data from a nationwide cohort and prospective web-based registry. The new ANN models, which outperformed all other classic ML methods, showed an area under the receiver operating characteristic curve (AUROC) of 0.8832 for predicting IP associated with necrotizing enterocolitis (NEC-IP) and 0.8797 for spontaneous IP (SIP). We tested these algorithms using patient data from our institution, which were not included in the training dataset, and obtained an AUROC of 1.0000 for NEC-IP and 0.9364 for SIP. NEC-IP and SIP in VLBW infants can be predicted at an excellent performance level with these newly developed ML models. The source code is available at https://github.com/kdhRick2222/Early-Prediction-of-Intestinal-Perforation-in-Preterm-Infants.

posed to predict NEC values, and our neural approaches produced promising NEC prediction results using a dataset that included 852 positive cases. However, it is more difficult to collect data for NEC-IP and SIP than for NEC; thus, the amount of data, especially for positive cases, is very limited (521 and 208 positive cases for NEC-IP and SIP, respectively). To address this problem, we utilized relevant information from a neural network trained for NEC prediction to improve the diagnostic accuracy for SIP and NEC-IP.
In this study, we introduced several deep neural networks for predicting NEC, NEC-IP, and SIP in infants. Each network takes a 54-dimensional vector as input and outputs a single value for binary classification. Specifically, we first developed our baseline neural network (Model 1) based on the conventional multilayer perceptron (MLP) architecture and then trained Model 1 to predict NEC, NEC-IP, and SIP separately. For ANNs, it is well known that simply adding layers can improve performance by introducing more nonlinearity; however, it can cause overfitting when the training dataset is insufficient. Therefore, we stacked the layers more deeply than the conventional models in diagnosis, as introduced in a previously published study 31, but determined the hyperparameters (e.g., the number of channels) carefully and added more advanced techniques, such as batch normalization 34 and dropout 35, to avoid overfitting and facilitate stable training. Specifically, Model 1 is a binary classifier composed of 5 hidden layers; each layer is formed from a block (Fig. 1a) arranged with an activation function, batch normalization, and dropout. It takes a 54-dimensional vector as an input and, after the application of all layers, renders feature vectors of dimension. Model 1 is a typical neural approach and can produce promising results when a large amount of data (e.g., for NEC) is available. However, the outcomes from Model 1 can be unreliable in real-world scenarios where the training dataset is insufficient. Therefore, we attempted to further improve our baseline Model 1 to predict NEC-IP and SIP more accurately, since the lack of data is more serious for NEC-IP and SIP than for NEC. To alleviate this problem, we developed a new approach that transfers information from the network trained to predict NEC to help estimate NEC-IP and SIP.
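The block structure of Model 1 described above can be illustrated with a minimal pure-Python sketch of the forward pass at inference time. The 54-dimensional input and the 5 hidden layers come from the text; the hidden widths, the choice of ReLU as the activation, the ordering (linear map, activation, batch normalization, dropout), and all function names are assumptions made only for illustration, not the authors' implementation.

```python
import math
import random

def dense(x, weights, bias):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    """ReLU activation (assumed; the text does not name the activation)."""
    return [max(0.0, v) for v in x]

def batch_norm_eval(x, mean, var, eps=1e-5):
    """Batch normalization at inference time, using stored running statistics."""
    return [(v - m) / math.sqrt(s + eps) for v, m, s in zip(x, mean, var)]

def init_layer(n_in, n_out, rng):
    """Randomly initialized layer with neutral batch-norm statistics."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return {"weights": weights, "bias": [0.0] * n_out,
            "mean": [0.0] * n_out, "var": [1.0] * n_out}

def build_model(widths, rng):
    """widths = [input_dim, hidden_1, ..., hidden_k]; the head maps to one logit."""
    hidden = [init_layer(a, b, rng) for a, b in zip(widths, widths[1:])]
    head = init_layer(widths[-1], 1, rng)
    return hidden, head

def predict(model, x):
    """Return the predicted probability of the positive class."""
    hidden, head = model
    for layer in hidden:
        x = dense(x, layer["weights"], layer["bias"])
        x = relu(x)
        x = batch_norm_eval(x, layer["mean"], layer["var"])
        # Dropout is the identity at inference time, so nothing is done here.
    logit = dense(x, head["weights"], head["bias"])[0]
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
HIDDEN_WIDTHS = [64, 64, 32, 32, 16]  # hypothetical widths, not from the paper
model = build_model([54] + HIDDEN_WIDTHS, rng)
probability = predict(model, [0.5] * 54)  # one min-max-normalized input vector
```

The weights here are random, so `probability` carries no clinical meaning; the sketch only shows how a 54-dimensional input flows through five blocks to a single output.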
Note that transfer learning is one of the most widely used approaches in deep learning for addressing a lack of data [36][37][38][39][40]. Therefore, we present additional models (Model 2 and Model 3) that can exploit information learned from the NEC dataset to predict SIP and NEC-IP.
Specifically, Model 2 is composed of two different MLPs. One branch predicts NEC, and the other branch predicts either SIP or NEC-IP (Fig. 1b). Notably, at the 4th layer of the MLP branch for NEC-IP and SIP in Model 2, feature vectors from the third layer of the network for NEC are fed as an additional input. By concatenating the feature vectors from NEC, we can utilize the information for NEC in predicting NEC-IP/SIP.
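The feature concatenation in Model 2 can be sketched as follows. The only details taken from the text are the 54-dimensional input and the fact that third-layer features of the NEC branch are concatenated into the input of the 4th layer of the NEC-IP/SIP branch; the hidden width, the ReLU activation, the omission of bias, batch normalization, and dropout, and all names are simplifying assumptions.

```python
import random

def dense_relu(x, weights):
    """Linear map followed by ReLU (bias, batch norm, and dropout omitted for brevity)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def random_weights(n_in, n_out, rng):
    """Weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(x, layers):
    """Apply a list of layers in order and return the final feature vector."""
    for weights in layers:
        x = dense_relu(x, weights)
    return x

rng = random.Random(0)
INPUT_DIM = 54  # from the text
WIDTH = 8       # hypothetical hidden width

# First three layers of each branch.
nec_layers = [random_weights(INPUT_DIM, WIDTH, rng),
              random_weights(WIDTH, WIDTH, rng),
              random_weights(WIDTH, WIDTH, rng)]
target_layers = [random_weights(INPUT_DIM, WIDTH, rng),
                 random_weights(WIDTH, WIDTH, rng),
                 random_weights(WIDTH, WIDTH, rng)]

# The 4th layer of the NEC-IP/SIP branch accepts its own third-layer features
# plus the third-layer features of the NEC branch, so its input width doubles.
fourth_layer = random_weights(2 * WIDTH, WIDTH, rng)

x = [0.5] * INPUT_DIM
nec_features = forward(x, nec_layers)
target_features = forward(x, target_layers)
fused = target_features + nec_features  # list concatenation
fourth_output = dense_relu(fused, fourth_layer)
```

The point of the sketch is the shape bookkeeping: the receiving layer must be sized for the concatenated vector, which is what lets NEC-specific features influence the NEC-IP/SIP prediction.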
Unlike that of Model 2, the network architecture of Model 3 is the same as that of Model 1. We employed conventional transfer learning to utilize information from NEC to estimate NEC-IP/SIP and fine-tuned the pretrained Model 1 to the specific NEC-IP/SIP datasets (Fig. 1c).
Comparison of performance between classic ML models and proposed ANN models. We provide prediction results in Table 2 to compare traditional ML models with our neural approaches. We observed that the proposed neural approach (Model 1) outperformed traditional learning-based methods in terms of area under the receiver operating characteristic curve (AUROC) scores for all cases (NEC, NEC-IP, and SIP).
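For readers unfamiliar with the metric, the AUROC used throughout these comparisons equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise definition is given below; the function name and the toy labels and scores are illustrative and are not data from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Toy example: three of the four positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auroc = auroc(labels, scores)  # 0.75
```

This quadratic-time form is only meant to make the definition concrete; for datasets of the size used here, a rank-based implementation from a standard library would be the practical choice.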
Moreover, our extended networks (i.e., Model 2 and Model 3), which we designed to mitigate the lack of data problem for NEC-IP (521 positive cases) and SIP (208 positive cases), showed improved results in predicting NEC-IP and SIP over the baseline network (Model 1).
In particular, compared to Model 1, which was trained on the NEC dataset, our proposed methods for NEC-IP and SIP (Model 2 and Model 3) exhibited improvements. Model 2 directly utilizes features distilled from Model 1, whereas Model 3 fine-tunes Model 1 for the prediction of either NEC-IP or SIP; Model 2 outperforms Model 3 in terms of AUROC, as shown in Table 3 and Fig. 2. The performance metrics of these models using a balanced validation dataset are described in Supplementary Tables 1 and 2. The performance improvement achieved by Model 2 and Model 3 indicates that information extracted to predict NEC can also be used to predict NEC-IP and SIP more accurately. Moreover, the proposed direct feature distillation of Model 2 may be preferable to the conventional fine-tuning approach (i.e., Model 3) for addressing problems with limited data.

Application of the new ANN models in a real clinical environment.
To assess the feasibility of the algorithm in a real clinical environment, we tested our newly developed algorithms using the patient data from our institution, which were not included in the training dataset. A total of 57 VLBW infants who were born at our hospital between 2019 and 2020 were included in this test analysis. Among the three ANN models, Model 2 achieved the highest AUROC scores: 1.0000 for the prediction of NEC-IP and 0.9364 for the prediction of SIP (Table 4 and Fig. 3).

Discussion
Our novel ML algorithms predicted NEC-IP and SIP in VLBW infants with favorable AUROC scores, outperforming all other classic ML algorithms. One of our algorithms exhibited an AUROC score of 1.0000 for predicting NEC-IP and 0.9364 for predicting SIP in real clinical settings. Our study shows the integration of a vast nationwide dataset with ML, and the resulting model can be used to predict the possibility of specific medical conditions in patients who may not perfectly represent the signs and symptoms of the disease.
Among preterm infants in the NICU, various medical problems exist at the same time, and multidisciplinary collaboration is required to make medical decisions for these patients. IP is one of the most devastating medical conditions that occurs in the NICU. Early diagnosis, swift judgment, and prompt surgical intervention are required to prevent severe complications and poor outcomes 41 . However, as we explained earlier, predicting the occurrence of IP is difficult. By running this algorithm, it is possible to analyze every preterm infant who is admitted to the NICU and predict each patient's likelihood of developing NEC-IP and SIP. Thus, early predictions of these serious medical conditions could provide clinicians with a much more stable management environment, enabling them to make better treatment decisions.
In recent years, AI and big data have been increasingly integrated into medicine because it is difficult for the unaided human clinician to acquire all the latest published knowledge, as is required by modern evidence-based medicine 30,42. AI is an important resource for medical research in that it can efficiently process large amounts of data. Additionally, AI can produce consistent and unbiased results without fatigue. Several studies in healthcare research have reported sufficient or even better risk prediction by AI methods compared to existing models [43][44][45][46][47]. Neural approaches in particular exhibit excellent performance on difficult problems involving big data with complex distributions. Because neural networks gain nonlinearity from added layers and their parameters can be fine-tuned by transfer learning 36,40, they have an advantage in complex tasks. In short, an ANN is an appropriate model for IP prediction, as it can handle imbalanced big data efficiently. However, even if the overall cohort is large, diseases with low prevalence always suffer from a lack of data. ML can produce excellent results only with a large amount of training data; thus, applying ML to diseases with low prevalence remains a challenge 48,49. To overcome the data imbalance problem, several studies have applied data processing techniques such as oversampling 50 and undersampling 51, as we did in our ANN models.
To further improve the performance of our models, we modified them by adding another branch or by pretraining on the prediction of another relevant disease with higher prevalence in order to help the model predict the target disease more accurately. As a result of these adjustments, the modified algorithms (Model 2 and Model 3) achieved better performance than the original model. According to previous studies, NEC-IP and SIP are regarded as separate disease entities, and the pathogenesis of SIP does not appear to correlate with that of NEC 5,6,52-55. Notably, however, training for NEC prediction improved not only the accuracy of NEC-IP prediction but also the accuracy of SIP prediction. These results show that pretraining on more prevalent medical conditions can help AI predict the occurrence of the target disease more accurately. Our study also highlights that it is necessary to customize the algorithm for each disease to apply an ML model in real clinical settings, especially if the disease is rare. Although our study showed a favorable outcome, it had its share of limitations. First, a limited number of factors were included because only data collected from the Korean Neonatal Network (KNN) were used. It is expected that the collection of further IP-related data, such as clinical symptoms, vital signs, and radiologic findings, will enable the model to produce better results. In addition, the limitations of AI studies, such as representation, homogeneity, and accuracy, were observed in this study. Another limitation is that it is difficult to determine how AI methods generate results due to the nature of self-extracted data from large datasets 49,56,57.
In conclusion, we developed our own ANN models to predict IP early in VLBW infants, and these new models achieved higher accuracy than classic ML algorithms. To our knowledge, this is the first study to develop an ML model to predict both NEC-IP and SIP using nationwide VLBW infant data. In addition, the newly proposed ANN models showed excellent performance within real NICU clinical settings. When more clinical data, such as vital signs, radiologic findings, biomarkers, and laboratory results, are gathered, we believe that a more accurate ML model will be developed, thereby achieving early prediction of these serious medical conditions and better clinical outcomes for VLBW infants.

Methods
Data collection. We derived data from infants registered in the KNN, a nationwide prospective cohort registry of VLBW infants 58 . Their clinical data were collected from 74 participating NICUs across the country and analyzed retrospectively for this study. Prior to participation in the KNN registry, informed consent was obtained from the parents of each infant, and all methods were carried out following relevant guidelines. This study was approved by the Hanyang University Institutional Review Board (IRB No. 2013-06-025-043).
The cohort comprised 12,555 VLBW infants weighing less than 1,500 g who were born between January 5, 2013, and December 31, 2018.
Disease definitions. NEC was defined according to Bell's modified staging grade ≥ II. NEC-IP was diagnosed when patients with NEC underwent any kind of abdominal surgical intervention (peritoneal drainage or laparotomy). SIP was defined when the patients underwent surgical intervention due to IP and the surgeon found no predisposing causes, such as NEC, intestinal atresia, or meconium plug. The full list of 54 variables used in ML analysis is shown in Supplementary Table 3.
Comparisons of baseline characteristics. A total of 54 variables, including various maternal and perinatal factors, were collected for ML. Among them, 18 clinical factors that were proposed as possible risk factors for either NEC-IP or SIP in previous studies were analyzed using conventional statistical methods. Student's t-test was performed to analyze the continuous variables, and the chi-squared test was used to analyze categorical variables. Statistical significance was set at P < 0.05. The Statistical Package for the Social Sciences version 22.0 for Windows software program (IBM Corp., Armonk, NY, USA) was used in all statistical analyses.
Table 2. Model performance of classic ML models for predicting NEC, NEC-IP, and SIP. ML machine learning, NEC necrotizing enterocolitis, NEC-IP intestinal perforation associated with necrotizing enterocolitis, SIP spontaneous intestinal perforation, SVM support vector machine, K-NN k-nearest neighbor, XGBoost extreme gradient boosting, GBM gradient boosting machine learning, LightGBM light gradient boosting machine learning, MLP multilayer perceptron.
Data preprocessing. Our dataset was composed of 12,555 infants in total; we divided them into training and evaluation datasets (Table 5). Moreover, to facilitate the network training procedure, a data preprocessing step was applied (Fig. 4). First, to solve the missing data problem in the given dataset, we imputed the missing (null) values with plausible values. Specifically, input values were categorized into ordinal, continuous, and categorical types. In the case of ordinal inputs, we imputed the null values with the mode (most frequently occurring) values, and in the case of continuous inputs, we imputed the null values with the mean values. Missing values of categorical inputs were replaced with the median values. After recovering the data, we mapped the dataset between 0 and 1 by using min-max normalization.
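The imputation and normalization steps described above can be sketched with the Python standard library. The fill rules (mode for ordinal, mean for continuous, median for categorical) follow the text as written; the function names, the use of `None` for missing values, and the example columns are hypothetical.

```python
from statistics import mean, median, mode

def impute(column, kind):
    """Replace None entries using the rule stated in the text for each input type."""
    observed = [v for v in column if v is not None]
    if kind == "ordinal":
        fill = mode(observed)      # most frequently occurring value
    elif kind == "continuous":
        fill = mean(observed)
    elif kind == "categorical":
        fill = median(observed)    # as stated in the text, on numerically coded categories
    else:
        raise ValueError(f"unknown input type: {kind}")
    return [fill if v is None else v for v in column]

def min_max(column):
    """Map values linearly onto [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical columns, not taken from the registry.
birth_weight = [900, None, 1100, 1400]   # continuous
ordinal_score = [1, 2, None, 2]          # ordinal

weight_filled = impute(birth_weight, "continuous")
weight_scaled = min_max(weight_filled)
score_filled = impute(ordinal_score, "ordinal")
```

In practice the fill values and the minimum and maximum would be computed on the training split only and then reused for the evaluation split, so that no information leaks from the evaluation data; the text does not say how this was handled.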
Finally, to mitigate the data imbalance problem (e.g., 11,703 negative and 852 positive cases for NEC in Table 5), which is a common intrinsic feature of disease datasets, we oversampled the smaller category (positive cases) and undersampled the larger category (negative cases), as suggested in previous studies 59,60.
Training. We used the binary cross entropy (BCE) loss to train Model 1, and Model 1 was separately trained to predict NEC, NEC-IP, and SIP. For Model 3, the Model 1 pretrained for NEC was further fine-tuned with the BCE loss, and the separately updated parameters were used to predict NEC-IP and SIP. Unlike Model 1 and Model 3, Model 2 was trained in a two-stage manner. In the first stage, the left branch of the network (Fig. 1b) was trained for NEC with the BCE loss. Then, the BCE loss for the target (NEC-IP or SIP), together with the BCE loss for NEC in the left branch, was used to jointly optimize the network for either NEC-IP or SIP. In both training steps, oversampling and undersampling techniques were used to alleviate the data imbalance problem. We employed the two-stage training scheme rather than a one-shot joint training approach because the two-stage approach was more stable during training.
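A minimal sketch of the resampling step and the BCE loss mentioned above is given below. Oversampling is done by drawing positives with replacement and undersampling by drawing negatives without replacement; the equal class sizes, the sample count, the clipping constant, and all names are assumptions, since the text does not state the resampling ratio.

```python
import math
import random

def resample_balanced(positives, negatives, n_per_class, rng):
    """Oversample positives (with replacement), undersample negatives (without)."""
    over = [rng.choice(positives) for _ in range(n_per_class)]
    under = rng.sample(negatives, n_per_class)  # requires n_per_class <= len(negatives)
    batch = [(x, 1) for x in over] + [(x, 0) for x in under]
    rng.shuffle(batch)
    return batch

def bce(y_true, p, eps=1e-7):
    """Binary cross entropy for one example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

rng = random.Random(0)
positives = list(range(5))           # hypothetical IDs of positive cases
negatives = list(range(100, 150))    # hypothetical IDs of negative cases
balanced = resample_balanced(positives, negatives, n_per_class=20, rng=rng)

# Loss of an uninformative prediction (p = 0.5) on a positive case: ln 2.
uninformative_loss = bce(1, 0.5)
```

Because positives are drawn with replacement, the same rare case can appear several times in a resampled set, which is one reason regularization such as dropout and early stopping matters when this technique is used.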
To train the three proposed models (i.e., Model 1, Model 2, and Model 3), the Adam optimizer 61 with a learning rate of 0.0001 was used for loss minimization. We used a dropout rate of 0.2 35 and a batch size of 128 34. Instead of fixing the number of iterations, we stopped the training using the early stopping technique to avoid the overfitting problem.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.